-
Couldn't load subscription status.
- Fork 71
Implement list_transpose()
#2059
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
I know that # Makes 100,000 size 3 list elements
x <- list(
sample(1e5),
sample(1e5),
sample(1e5)
)
bench::mark(
list_transpose(x),
purrr::list_transpose(x)
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 list_transpose(x) 4.85ms 5.46ms 176. 3.43MB 0
#> 2 purrr::list_transpose(x) 4.03s 4.03s 0.248 32.8MB 1.24
# Makes 10 million size 3 list elements
x <- list(
sample(1e7),
sample(1e7),
sample(1e7)
)
# Takes way too long with purrr
bench::mark(list_transpose(x), iterations = 10)
#> # A tibble: 1 × 6
#> expression min median `itr/sec` mem_alloc `gc/sec`
#> <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl>
#> 1 list_transpose(x) 486ms 488ms 2.05 343MB 9.22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Approving but I would expect the ptype of the outer list to be preserved, for instance with list_of(). If you disagree, I'd like to talk this over together before merging.
R/list-transpose.R
Outdated
| # Disallow `NULL` entirely. These would break `vec_size()` invariants of | ||
| # `list_transpose()` if we simply drop them via `list_interleave()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think an argument could be made for coercing to an unspecified vector, but I don't mind the stricter behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought long and hard about this today and decided to add a null argument
- By default, we error on
NULLelements - If
nullis supplied, it replaces everyNULLelement withnull- It must be size 1
- It participates in common type determination alongside the elements of
xandptype
The thing that took me the longest to figure out was the restriction that null must be size 1 (as opposed to recycling alongside all of the elements of x and size). Without that, I kept running into various scenarios that were hard to explain and didn't make any sense. I have added some tests that call these out to remind us of why we have this restriction. In the end I think it makes sense, you can think of it like:
nullallows you to treatNULLas a size 1 element of some type
Which ends up being what you want for any use case I can think of.
library(vctrs)
# Error by default
try(list_transpose(list(1, NULL, 2:3, NULL)))
#> Error in list_transpose(list(1, NULL, 2:3, NULL)) :
#> `list(1, NULL, 2:3, NULL)[[2]]` must be a vector, not `NULL`.
list_transpose(list(1, NULL, 2:3, NULL), null = NA)
#> [[1]]
#> [1] 1 NA 2 NA
#>
#> [[2]]
#> [1] 1 NA 3 NA
list_transpose(list(1, NULL, 2:3, NULL), null = 0L)
#> [[1]]
#> [1] 1 0 2 0
#>
#> [[2]]
#> [1] 1 0 3 0
# Invariants on `null`:
# Must be size 1!
try(list_transpose(list(1, NULL, 2:3), null = 1:2))
#> Error in list_transpose(list(1, NULL, 2:3), null = 1:2) :
#> `null` must have size 1, not size 2.
# Does contribute to common type!
typeof(list_transpose(list(1L, NULL, 2L), null = 3)[[1]])
#> [1] "double"
# Some edge cases:
# With empty list
list_transpose(list(), null = 0L)
#> list()
# Input:
# - List size 0
# - Element size 2 (supplied)
# - Element type integer (from `null`)
# Output:
# - List size 2
# - Element size 0
# - Element type integer
list_transpose(list(), null = 0L, size = 2)
#> [[1]]
#> integer(0)
#>
#> [[2]]
#> integer(0)
# Input:
# - List size 2
# - Element size 1 (`NULL` treated as size 1 when `null` is supplied)
# - Element type logical (from `null`)
# Output:
# - List size 1
# - Element size 2
# - Element type logical
list_transpose(list(NULL, NULL), null = NA)
#> [[1]]
#> [1] NA NAThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I'm curious what are the problematic cases with null a size n vector, where n is the result of common size determination? Do these occur when null is taken into account for the common size?
0ff20ff to
d9b0bc2
Compare
R/list-transpose.R
Outdated
| if (is.object(x)) { | ||
| # The list input type should not affect the transposition process in any | ||
| # way. In particular, supplying a list subclass that doesn't have a | ||
| # `vec_cast.subclass.list` method shouldn't prevent the insertion of | ||
| # `list(null)` before the transposition. The fact that we must insert | ||
| # `list(null)` should be considered an internal detail. | ||
| x <- unclass(x) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tests for this, but basically I want this to work
x <- structure(list(1, NULL), class = c("my_list", "list"))
list_transpose(x, null = 0)But it doesn't due to null handling because we have to do this
vec_assign(x, vec_detect_missing(x), list(0))
#> Error in `vec_assign()`:
#> ! Can't convert <list> to <my_list>.which doesn't work unless you bypass the class
We use obj_check_list() to check that it's a list, and after that we don't care about the class type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that suggests that a "list" class should have a coercion method from list?
R/list-transpose.R
Outdated
| # Computes the `ptype` incorporating both `x` and `null` | ||
| # | ||
| # Like `ptype_finalize()` in `vec_recode_values()` and `vec_if_else()` | ||
| list_transpose_ptype_common <- function( | ||
| x, | ||
| null, | ||
| ptype, | ||
| x_arg, | ||
| error_call | ||
| ) { | ||
| if (!is_null(ptype)) { | ||
| # Validate and return user specified `ptype` | ||
| ptype <- vec_ptype(ptype, x_arg = "ptype", call = error_call) | ||
| return(vec_ptype_finalise(ptype)) | ||
| } | ||
|
|
||
| # Compute from `x` | ||
| ptype <- vec_ptype_common(!!!x, .arg = x_arg, .call = error_call) | ||
|
|
||
| if (!is_null(null)) { | ||
| # Layer in `null` | ||
| ptype <- vec_ptype2( | ||
| x = null, | ||
| y = ptype, | ||
| x_arg = "null", | ||
| y_arg = "", | ||
| call = error_call | ||
| ) | ||
| } | ||
|
|
||
| ptype | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same trick used in vec_recode_values() and vec_if_else()
Basically I really want to avoid vec_ptype_common(!!!x, null = null) because that materializes a new list, and if x is really long then that is needlessly expensive.
tests/testthat/test-list-transpose.R
Outdated
| test_that("`null` can't result in recycle to size 0", { | ||
| # This is one reason we force `null` to be size 1. | ||
| # If it participated in common size determination it would result in `list()` | ||
| # by forcing the elements to recycle to size 0 first. | ||
| x <- list(1L, 2L) | ||
| expect_snapshot(error = TRUE, { | ||
| list_transpose(x, null = integer()) | ||
| }) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reason why null is size 1
tests/testthat/test-list-transpose.R
Outdated
| # This is one reason we force `null` to be size 1. | ||
| # If it participated in common size determination, it would result in an | ||
| # element size of 2, and then an output list size of 2, giving us | ||
| # `list(integer(), integer())` which would be very odd. | ||
| # | ||
| # Input | ||
| # - List size 0 | ||
| # - Element size 0 (inferred from list) | ||
| # - Element type integer (inferred from `null`) | ||
| expect_snapshot(error = TRUE, { | ||
| list_transpose(list(), null = 1:2) | ||
| }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A reason why null is size 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's discuss the list type preservation issue.
R/list-transpose.R
Outdated
| # Disallow `NULL` entirely. These would break `vec_size()` invariants of | ||
| # `list_transpose()` if we simply drop them via `list_interleave()`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I'm curious what are the problematic cases with null a size n vector, where n is the result of common size determination? Do these occur when null is taken into account for the common size?
R/list-transpose.R
Outdated
| if (is.object(x)) { | ||
| # The list input type should not affect the transposition process in any | ||
| # way. In particular, supplying a list subclass that doesn't have a | ||
| # `vec_cast.subclass.list` method shouldn't prevent the insertion of | ||
| # `list(null)` before the transposition. The fact that we must insert | ||
| # `list(null)` should be considered an internal detail. | ||
| x <- unclass(x) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that suggests that a "list" class should have a coercion method from list?
536209a to
fbda296
Compare
|
@lionel- and I created a nice plan from this that we both are on board with
If we can do that, we can say that When size or ptype information is provided by the list itself, we use that. When size or ptype information is missing, we compute it as the common type and common size of the inputs, HOWEVER, we require that the list have at least 1 non- It would return a I think we'd remove the The We'd be treating |
Closes #2055
This feels quite useful to me, and took enough time thinking about the edge cases that I feel like it deserves its own function, even if it is quite a short wrapper around
list_interleave()andvec_chop()at this point.Caution
I'm a little worried about the name clashing with
purrr::list_transpose(), but I'm not entirely sure what to do about that. It has different invariants than the purrr function, and I think it is a little closer to purer vctrs principles.For the problem of
list(integer(), integer())that we talked about over slack, I've realized that the correct thing for vctrs to do is to expose bothsizeandptypearguments. These control "the expected size and type of each element in the list".I think these very elegantly let you recover the original object when you need to double-transpose in a generic way and have to worry about retaining the original size and type, even in the empty object case.
I think this is better than trying out some new
rectangular_list_ofobject. I know it means that the object itself can't carry around the size and type info, but hopefully that's okay in terms of how people might do a double-transpose.The
sizeargument is also nice when you have an expected element size, but each element is recyclable to that expected size and you happen to have all size 1 elements